Abstract:
Genomic prediction is widely used to select candidates
for breeding. Size and composition of the reference
population are important factors influencing prediction
accuracy. In Holstein dairy cattle, large reference
populations are used, but this is difficult to achieve in
numerically small breeds and for traits that are not
routinely recorded. The prediction accuracy is usually
estimated using cross-validation, requiring the full data
set. It would be useful to have a method to predict the
benefit of multibreed reference populations that does
not require the availability of the full data set. Our
objective was to study the effect of the size and breed
composition of the reference population on the accuracy
of genomic prediction using genomic BLUP and
Bayes R. We also examined the effect of trait heritability
and validation breed on prediction accuracy. Using
these empirical results, we investigated the use of a
formula to predict the effect of the size and composition
of the reference population on the accuracy of genomic
prediction. Phenotypes were simulated in a data set
containing real genotypes of imputed sequence variants
for 22,752 dairy bulls and cows, including Holstein, Jersey,
Red Holstein, and Australian Red cattle. Different
reference populations were constructed, varying in size
and composition, to study within-breed, multibreed,
and across-breed prediction. Phenotypes were simulated
varying in heritability, number of chromosomes,
and number of quantitative trait loci. Genomic prediction
was carried out using genomic BLUP and Bayes R.
We used either the genomic relationship matrix (GRM)
to estimate the number of independent chromosomal
segments and subsequently to predict accuracy, or the
accuracies obtained from single-breed reference populations
to predict the accuracies of larger or multibreed
reference populations. Using the GRM overestimated
the accuracy; this overestimation was likely due to close
relationships among some of the reference animals.
Consequently, the GRM could not be used to predict
the accuracy of genomic prediction reliably. However,
a method using the prediction accuracies obtained by
cross-validation with a small, single-breed reference
population gave a reasonably close estimate of the
accuracy for a multibreed reference population and
slightly overestimated the accuracy for a larger
reference population of the same breed. This
method could be useful for making decisions regarding
the size and composition of the reference population.
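
The abstract does not say which formula was used. As an illustration only, the sketch below assumes the commonly used deterministic form r = sqrt(N h² / (N h² + Me)), where N is the reference population size, h² the heritability, and Me the number of independent chromosomal segments. The function names are hypothetical. Me is backed out from an accuracy observed in a small single-breed reference population and then reused to project the accuracy for a larger reference population of the same breed.

```python
import math


def predicted_accuracy(n_ref, h2, me):
    """Assumed deterministic formula: r = sqrt(N*h2 / (N*h2 + Me))."""
    if n_ref < 0 or not 0 < h2 <= 1 or me <= 0:
        raise ValueError("need n_ref >= 0, 0 < h2 <= 1 and me > 0")
    return math.sqrt(n_ref * h2 / (n_ref * h2 + me))


def me_from_accuracy(n_ref, h2, accuracy):
    """Invert the same formula: Me = N*h2 * (1 - r^2) / r^2."""
    if not 0 < accuracy < 1:
        raise ValueError("accuracy must lie strictly between 0 and 1")
    r2 = accuracy ** 2
    return n_ref * h2 * (1 - r2) / r2


# Illustrative numbers only (not taken from the study).
me_hat = me_from_accuracy(n_ref=1000, h2=0.3, accuracy=0.4)
projected = predicted_accuracy(n_ref=5000, h2=0.3, me=me_hat)
```

Extending this to a multibreed reference population would additionally require an assumption about how much information animals of other breeds contribute, which this sketch does not attempt.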
Abstract:
Genomic prediction is applicable to individuals of different breeds. Empirical results to date, however, show limited benefits in using information on multiple breeds in the context of genomic prediction. We investigated a multitask Bayesian model, presented previously by others, implemented in a Bayesian stochastic search variable selection (BSSVS) model. This model allowed for evidence of quantitative trait loci (QTL) to be accumulated across breeds or for both QTL that segregate across breeds and breed-specific QTL. In both cases, single nucleotide polymorphism effects were estimated with information from a single breed. Other models considered were a single-trait and multitrait genomic residual maximum likelihood (GREML) model, with breeds considered as different traits, and a single-trait BSSVS model. All single-trait models were applied to each of the 2 breeds separately and to the pooled data of both breeds. The data used included a training data set of 6,278 Holstein and 722 Jersey bulls, as well as 374 Jersey validation bulls. All animals had genotypes for 474,773 single nucleotide polymorphisms after editing and phenotypes for milk, fat, and protein yields. Using the same training data, BSSVS consistently outperformed GREML. The multitask BSSVS, however, did not outperform single-trait BSSVS, which used pooled Holstein and Jersey data for training. Thus, the rigorous assumption that the traits are the same in both breeds yielded a slightly better prediction than a model that had to estimate the correlation between the breeds from the data. Adding the Holstein data significantly increased the accuracy of the single-trait GREML and BSSVS in predicting the Jerseys for milk and protein, in line with estimated correlations between the breeds of 0.66 and 0.47 for milk and protein yields, whereas only the BSSVS model significantly improved the accuracy for fat yield with an estimated correlation between breeds of only 0.05. 
The relatively high genetic correlations for milk and protein yields, and the superiority of the pooling strategy, are likely the result of the observed admixture between both breeds in our data. The Bayesian model was able to detect several QTL in Holsteins, which likely enabled it to outperform GREML. The inability of the multitask Bayesian models to outperform a simple pooling strategy may be explained by the fact that the pooling strategy assumes equal effects in both breeds; furthermore, this assumption may be valid for moderate- to large-sized QTL, which are important for multibreed genomic prediction.
Abstract:
Genomic selection (GS) is routinely applied to many purebreds and lines of farm species. However, this method can be extended to predictions across purebreds as well as for crossbreds. This is useful for swine and poultry, for which selection in nucleus herds is typically performed on purebred animals, whereas the commercial product is the crossbred animal. Single-step genomic BLUP (ssGBLUP) is a widely applied method that can use the recently developed algorithm for proven and young (APY). The APY allows for greater efficiency in computing BLUP solutions by exploiting the theory of limited dimensionality of genomic information and chromosome segments (Me). This study investigates the predictivity as a proxy for accuracy across and within 2 purebred pig lines and their crosses, under the application of APY in an ssGBLUP setup, and different levels of Me overlapping across populations. The data consisted of approximately 210k phenotypic records for 2 traits (T1 and T2) with moderate heritability. Genotypes for 43k SNP markers were available for approximately 46k animals, of which 26k and 16k belonged to 2 pure lines (L1 and L2), and approximately 4k were crossbreds. The complete pedigree had more than 720k animals. Different multivariate ssGBLUP models were applied, either with the regular or APY inverse of the genomic relationship matrix (G). The models included a standard bivariate animal model with 3 lines evaluated as 1 joint line, and for each trait individually, a 3-trait animal model with each line treated as a separate trait. Both models provided the same predictivity across and within the lines. Using data from either pure line as a training set resulted in a similar predictivity for the crossbred animals (0.18 to 0.21). Across-line predictive ability was limited to less than half of the maximum predictivity for each line (L1T1 0.33, L1T2 0.25, L2T1 0.35, L2T2 0.36).
For crossbred predictions, APY performed equivalently to regular G inverse when the number of core animals was equal to the number of eigenvalues explaining between 98% and 99% of the variance of G (4k to 8k) including all lines. Predictivity across the lines is achievable because of the shared Me between them. The number of those shared segments can be obtained via eigenvalue decomposition of genomic information available for each line.
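
As a rough illustration of the core-size rule described above, the sketch below counts how many of the largest eigenvalues are needed to explain a chosen fraction of the total variance. The function name and the toy eigenvalues are hypothetical; in practice the eigenvalues would come from an eigendecomposition of G (for example with `numpy.linalg.eigvalsh`), which is omitted here so the example stays dependency-free.

```python
def n_eigenvalues_for_variance(eigenvalues, fraction):
    """Smallest number of largest eigenvalues whose sum reaches
    `fraction` of the sum of all eigenvalues."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running >= fraction * total:
            return count
    return len(ordered)


# Toy spectrum (not from the study): a few large eigenvalues, then a tail.
toy_eigenvalues = [5.0, 3.0, 1.0, 0.5, 0.5]
n_core_85 = n_eigenvalues_for_variance(toy_eigenvalues, 0.85)
n_core_98 = n_eigenvalues_for_variance(toy_eigenvalues, 0.98)
```

The returned count would then be a candidate number of core animals for APY, under the assumption that core size is chosen from the eigenvalue spectrum of G as the abstract describes.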
Abstract:
Genomic prediction involves characterization of chromosome fragments in a training population to predict merit. Confidence in the predictions relies on their evaluation in a validation population. Many commercial animals are multibreed (MB) or crossbred, but seedstock populations tend to be purebred (PB). Training in MB allows selection of PB for crossbred performance. Training in PB to predict MB performance quantifies the potential for across-breed genomic prediction. Efficiency of genomic selection was evaluated for a trait with heritability 0.5 simulated using actual SNP genotypes. The PB population had 1,086 Angus animals, and the MB population had 924 individuals from 8 sire breeds. Phenotypic values were simulated for scenarios including 50, 100, 250, or 500 additive QTL randomly selected from 50K SNP panels. Panels containing various numbers of SNP, including or excluding the QTL, were used in the analysis. A Bayesian model averaging method was used to simultaneously estimate the effects of all markers on the panels in MB (or PB) training populations. Estimated effects were utilized to predict genomic merit of animals in PB (or MB) validation populations. Correlations between predicted and simulated genomic merit in the validation population were used to reflect predictive ability. Panels that included QTL were able to account for 50% or more of the within-breed genetic variance when the trait was influenced by 50 QTL. The predictive power eroded as the number of QTL increased. Panels that comprised the QTL and no other markers were able to account for 50% or more of the genetic variance even with 500 QTL. Panels that included genomic markers as well as QTL had less predictive power as the number of markers on the panel was increased. Panels that excluded the QTL and relied on markers in linkage disequilibrium (LD) to predict QTL effects performed more poorly than marker panels with QTL.
Real-life situations with 50K panels that excluded the QTL could account for no more than 20% of the genetic variation for 50 QTL and less than 10% for 500 QTL. The difference between panels that included and excluded QTL indicates that the predictive ability of existing panels is limited by their LD. Training in PB to predict MB tended to be more predictive than training in MB to predict PB due to greater average levels of LD in PB than in MB populations. Improved across-breed prediction of genomic merit will require panels with more than 50,000 markers.
Abstract:
The aim of this study was to assess the accuracy of genomic predictions for 19 traits including feed efficiency, growth, and carcass and meat quality traits in beef cattle. The 10,181 cattle in our study had real or imputed genotypes for 729,068 SNP, although not all cattle were measured for all traits. Animals included Bos taurus, Brahman, composite, and crossbred animals. Genomic EBV (GEBV) were calculated using 2 methods of genomic prediction [BayesR and genomic BLUP (GBLUP)], either using a common training dataset for all breeds or using a training dataset comprising only animals of the same breed. Accuracies of GEBV were assessed using 5-fold cross-validation. The accuracy of genomic prediction varied by trait and by method. Traits with a large number of recorded and genotyped animals and with high heritability gave the greatest accuracy of GEBV. Using GBLUP, the average accuracy was 0.27 across traits and breeds, but the accuracies between breeds and between traits varied widely. When the training population was restricted to animals from the same breed as the validation population, GBLUP accuracies declined by an average of 0.04. The greatest decline in accuracy was found for the 4 composite breeds. The BayesR accuracies were greater by an average of 0.03 than GBLUP accuracies, particularly for traits in which mutations of moderate to large effect in known genes are segregating. The accuracies of 0.43 to 0.48 for IGF-I traits were among the greatest in the study. Although accuracies are low compared with those observed in dairy cattle, genomic selection would still be beneficial for traits that are hard to improve by conventional selection, such as tenderness and residual feed intake. BayesR identified many of the same quantitative trait loci as a genomewide association study but appeared to map them more precisely. All traits appear to be highly polygenic, with thousands of SNP independently associated with each trait.
Abstract:
An equivalent model for multibreed variance-covariance estimation is presented. It considers the additive case, with or without segregation variances. The model is based on splitting the additive genetic values into several independent parts depending on their genetic origin. For each part, it expresses the covariance between relatives as a partial numerator relationship matrix times the corresponding variance component. Estimation of fixed effects, random effects, or variance components under the model is as simple as in any model including several random factors. We present a small example describing the mixed model equations for genetic evaluations and two simulated examples to illustrate the Bayesian variance component estimation.
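
The decomposition described in this abstract can be written compactly as follows. The notation is an assumption made for illustration and is not taken from the paper: $\mathbf{a}_k$ is the part of the additive genetic values attributed to genetic origin $k$ (a breed or, if included, a segregation term), $\mathbf{A}_k$ is the corresponding partial numerator relationship matrix, and $\sigma^2_k$ is its variance component.

```latex
\mathbf{a} = \sum_{k=1}^{K} \mathbf{a}_k,
\qquad
\operatorname{Var}(\mathbf{a}_k) = \mathbf{A}_k \,\sigma^2_k,
\qquad
\operatorname{Cov}(\mathbf{a}_k, \mathbf{a}_{k'}) = \mathbf{0} \quad (k \neq k'),
\qquad\Longrightarrow\qquad
\operatorname{Var}(\mathbf{a}) = \sum_{k=1}^{K} \mathbf{A}_k \,\sigma^2_k .
```

Because the parts are independent, each $\mathbf{a}_k$ enters the mixed model as an ordinary random factor with its own covariance structure, which is why estimation is no harder than in a model with several random effects.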